Adjusting Content To User Profiles

ABSTRACT

One embodiment is a method that determines at a client computer a relevancy of information received with respect to a user profile. The method then adjusts a ranking of the information according to the relevancy and displays a selected portion of the adjusted information on the client computer.

BACKGROUND

Web-based communities and hosting services aim to facilitate sharing of information to users. Web 2.0 offers various mechanisms for end users to subscribe to news and other dynamic content, for instance RSS feeds. Many users cannot manually filter the quickly changing data received from all the potentially relevant sources. For one reason, too much information exists to adequately filter. Information overflow due to a lack of personalization (low relevance) has recently emerged as a problem on both news portals and specific client-side reader software.

The problem of information overflow cannot be easily solved even if users formulate precise querying and information filtering criteria. Other obstacles still exist. In the news domain, for instance, information depends on events in the outside world, and users are not trained to specify their interests in an appropriate formal way. Further, relevance of information often varies systematically over time. Even over short periods of time, the relevant information changes with the different personal contexts a user encounters.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow diagram for adjusting web content with user profiles in accordance with an exemplary embodiment.

FIG. 2A is a block diagram of web content before being adjusted with a user profile in accordance with an exemplary embodiment.

FIG. 2B is a block diagram of web content after being adjusted with a user profile in accordance with an exemplary embodiment.

FIG. 3 is a block diagram of a computer for executing methods in accordance with an exemplary embodiment.

DETAILED DESCRIPTION

Exemplary embodiments are directed to apparatus, systems, and methods to adjust web content based on a user profile. In one embodiment, behavior of a user on a client computer is monitored and used to build a profile of the user. Thereafter, web pages, electronic documents, or modules having a specific function (for example, RSS feeds) are altered or changed according to relevancy of their content with respect to the user profile.

In one embodiment, the profile of the user is built on the client computer, not on a central or shared server. Further, incoming web content is compared with the user profile and ranked at the client computer. Performing these functions on the client computer, as opposed to a shared server, protects the personal information of the user from being disseminated, reviewed by unauthorized third parties, misappropriated, and subjected to other privacy concerns.

Incoming information or web content is automatically changed based on the user profile. For example, as an RSS feed retrieves headline news information, the content of the news is compared and ranked with respect to the user profile. Textual similarities between the headline news stories and the user profile are determined, and these similarities are used to change how the headline news stories are presented to a user on a personalized web page of the user. For instance, news stories having a higher relevancy to the user profile are moved up or to the top of a module on the web page. By contrast, news stories having a lower relevancy to the user profile are moved lower or even removed from being displayed in the module. In this manner, the more relevant headline news stories are presented first or given a higher hierarchical listing on the web page of the user.

In one exemplary embodiment, a system residing on a local computer monitors behavior of the user. By way of example, the system monitors all recently visited web pages, blogs, and RSS feed items to determine current interests and information needs of the user. This information is used to build the user profile on the client computer. The same system then uses this detailed user profile to rank potentially relevant content fetched from the internet according to the estimated likelihood of relevance to the user profile.

Exemplary embodiments offer end users a personalized news and content platform that provides convenient access to the information that is most relevant to the user. Exemplary embodiments also automatically adapt to personal preferences and interests without any need for the user to set up a profile manually or enter search terms. Maintaining the profile and ranking on the local computer ensures privacy without disclosing any personal information of the user to third parties.

One embodiment uses a personalization engine that executes on the client computer. The engine monitors relevant actions of a user and compiles them into the user profile that is used for subsequent content recommendation. The term “recommendation” subsumes both the selection (or filtering) and ranking of articles. No details of user profiles have to leave the local machine (client computer).

FIG. 1 is a flow diagram for adjusting web content with user profiles in accordance with an exemplary embodiment. According to block 100, the behavior of the user is observed. For example, user activity with respect to web pages is observed.

According to block 110, information is extracted from the web pages visited by the user. Real-time information is collected and stored while the user navigates to different web pages. The content or actual information presented on the web pages is used to build the user or client profile.

A system that monitors a user on a personal computer has access to all the personal information stored on that machine. In one embodiment, a major source of information to be used for assembling a profile on the personal computer of the user is the behavior of the user on the web page being visited. To this end, specific web browser plug-ins installed on the client computer constantly collect relevant information, in particular the HTML content of all pages visited by a user. Depending on the computational resources that are available for the application, the browsing history of the user can then be analyzed using a technique of appropriate complexity. A transformation of visited pages into a bag-of-words (BOW) representation is computationally quick and can be run as a service (process) that is constantly running in the background. This background process transforms and stores each web page at the same time the web browser displays it to the user.
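By way of a non-limiting illustration, the transformation of a visited page into a BOW representation might be sketched as follows. The sketch assumes the plug-in hands over the page as an HTML string; the names `page_to_bow` and `_TextExtractor` are hypothetical and are not part of the described embodiment. Only the Python standard library is used.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def page_to_bow(html):
    """Return a bag of words (term -> count) for one visited page.

    Hypothetical helper: tokenization is a simple lowercase
    alphanumeric split, which is an assumption of this sketch.
    """
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    text = " ".join(extractor.parts).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))
```

A background process could call `page_to_bow` once per displayed page and store the resulting counts locally, so that no page content has to leave the client computer.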

A more complex way to extract information for specific tasks is Named-Entity Recognition. This technique allows information to be discovered in web pages. For instance, the technique can find persons, city names, companies, and other entities in web pages. Such features allow embodiments to construct meaningful components in bag-of-words representations.

According to block 120, information extracted from web pages is placed in a profile. This information is used to build the user profile.

In one exemplary embodiment, different methods exist to compute a user profile from a BOW representation of a user's web history. Rather than using raw BOW vectors, one exemplary format is a term frequency-inverse document frequency (TFIDF) representation, a standard information retrieval (IR) technique. The cosine of the angle between corresponding vectors in this representation is used as a similarity measure for text documents.
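As a minimal sketch of the TFIDF representation and the cosine measure, the following assumes each document is already a list of tokens and uses the common weighting tf x log(N/df). That weighting is one of several TFIDF variants and is an assumption of this sketch; the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse vector (dict) per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        term_freq = Counter(tokens)
        vectors.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in term_freq.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With this weighting, a term that occurs in every document receives weight zero, so only terms that distinguish documents contribute to the similarity.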

Other exemplary embodiments support more complex features to be incorporated into the vectors, such as semantic annotations, meta-data, or named entities, and any better suited similarity measure between documents. The following strategies allow exemplary embodiments to rank candidate documents based on the computation of a personal profile.

In one embodiment, an efficient way to compute a profile from the web history TFIDF matrix is to collapse this matrix to a single vector by computing the average TFIDF vector of all documents as their arithmetic mean. In this representation, many terms reflecting specific user interests can be expected to occur more frequently than for the average user. The similarity of the candidate documents to this profile vector is used as an efficient strategy for ranking the documents.

Another embodiment does not collapse the matrix to a single vector, but ranks the candidate documents by their similarities to any page visited by the user (nearest neighbor). This strategy gives higher weight to the occurrence of rare specific terminology in both the candidate page and the web history.
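The two ranking strategies described above can be contrasted in a short sketch. It assumes history pages and candidate documents are already available as sparse weight vectors (dictionaries mapping terms to TFIDF weights); all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Arithmetic mean of sparse vectors; missing terms count as zero."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    count = len(vectors)
    return {t: w / count for t, w in total.items()} if count else {}


def rank_by_mean_profile(history, candidates):
    """Rank candidate names by similarity to the single averaged profile vector."""
    profile = mean_vector(history)
    return sorted(candidates, key=lambda n: cosine(profile, candidates[n]), reverse=True)


def rank_by_nearest_page(history, candidates):
    """Rank candidate names by similarity to their most similar history page."""
    def best(vec):
        return max((cosine(vec, page) for page in history), default=0.0)
    return sorted(candidates, key=lambda n: best(candidates[n]), reverse=True)
```

For a history dominated by one topic plus a single page on a second topic, the averaged profile tends to favor candidates on the dominant topic, whereas the nearest-neighbor ranking can place a candidate that closely matches the single page first.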

As another example, content from different sources will usually be semantically “typed.” For example, interest in “World news - Iraq” does not necessarily imply interest in “Travel deals - Iraq”. Depending on the degree of structure and other characteristics, the following three solutions allow an exemplary embodiment to use only the relevant part of the web history to score each specific candidate document.

As a first solution, if the content to be selected and ranked is classified or annotated appropriately, then the parts of the web history that are useful to build a profile for selecting and ranking the corresponding content are usually a subset of the full web history. Classifiers built offline and uploaded to the client are used to select these relevant web pages. This helps to build a separate category-specific profile based on the relevant content only.

As a second solution, even without any given taxonomy, unsupervised machine learning techniques like clustering or (probabilistic) latent semantic indexing allow an exemplary embodiment to form groups of similar, and hence usually related, candidate documents. These documents can then be annotated by the group(s) to which they belong. By way of example, each group is characterized by the terms that are observed more frequently in it than in all other groups or on average. These characterizations are then used as classifiers to find the relevant parts of the web history to build a profile.

As a third solution, a similar approach is taken to classify web pages with respect to the different available sources. Lazy learning schemes like k-nearest neighbors allow an exemplary embodiment to classify the set of pages from the web history in terms of the different sources of candidate content. For each history page, the k most similar content pages are retrieved. Based on the cumulated similarity scores and neighbors per source, each page is classified as relevant to none, one, or multiple sources. Again, for each source a separate profile is built from the relevant pages, and it is used to select and rank candidate pages from that source by similarity to the profile.
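The k-nearest-neighbor classification of a history page against the available sources might be sketched as follows. The decision rule (a fixed threshold on the cumulated similarity per source) and the default values of `k` and `threshold` are assumptions of this sketch, since the description does not fix them; the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relevant_sources(page_vec, content, k=3, threshold=0.5):
    """Classify one history page as relevant to none, one, or several sources.

    content: list of (source_name, vector) pairs for candidate content pages.
    The k most similar content pages are retrieved, their similarities are
    cumulated per source, and every source whose total reaches the
    (assumed) threshold is returned.
    """
    scored = sorted(
        ((cosine(page_vec, vec), source) for source, vec in content),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    totals = {}
    for similarity, source in scored:
        totals[source] = totals.get(source, 0.0) + similarity
    return {source for source, total in totals.items() if total >= threshold}
```

The history pages assigned to a source can then be collected into that source's separate profile, which in turn is used to rank candidate pages from the same source.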

As another example, timeliness of content that is presented to a user is utilized. The profile of a user can change over time. So recently visited pages in the web history generally receive a higher weight than older pages. To account for these properties of the application, a timestamp of web pages is used to compute a decay factor that changes the time-agnostic similarity scores. The decay factor of each document or visited web page is a function of the time a page was visited or an article was published and the current time. A straightforward choice is to define a time after which the impact of a page halves, leading to an exponential decay. More complex schemes can be incorporated by considering more complex functions above. By way of example, such schemes include accounting for the fact that behavior on weekends is different from weekdays, and that the time of day can have a high impact as well.
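The half-life choice described above corresponds to a decay factor of 0.5 raised to the power (age / half-life). A minimal sketch follows; timestamps are assumed to be seconds since a common epoch, and the one-week default half-life is an arbitrary illustrative value, not one taken from the description.

```python
def decay_factor(age_seconds, half_life_seconds):
    """Exponential decay: 1.0 at age zero, halving every half-life."""
    return 0.5 ** (max(age_seconds, 0.0) / half_life_seconds)


def decayed_score(similarity, timestamp, now, half_life_seconds=7 * 24 * 3600):
    """Scale a time-agnostic similarity score by the age of the page or article.

    The default half-life of one week is an assumption of this sketch.
    """
    return similarity * decay_factor(now - timestamp, half_life_seconds)
```

A weekday or time-of-day scheme could replace `decay_factor` with a more elaborate function of the two timestamps without changing how `decayed_score` is used.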

As yet another example, profiles for reoccurring topics or activities on the web are improved by retrieving only the set of related pages from the history. An appropriate similarity measure allows an exemplary embodiment to introduce weights to the pages in the history that reflect their relevance to the current topic or activity. The resulting short-term profiles help to quickly switch between different contexts. Exemplary contexts include “working” and “shopping.”

As yet another example, any page a user visits can contain further useful information, such as further RSS feeds that contain relevant information. When enough evidence for the usefulness of a new data source has been collected, the user is prompted on whether or not this feed should be added to the list of monitored sources.

According to block 130, a user navigates to a personalized web page. FIG. 2A is a block diagram of web content of a personalized web page 200 before being adjusted with a user profile in accordance with an exemplary embodiment.

The personalized web page 200 includes a query box 210 and a plurality of modules 220-260. By way of example, these modules include, but are not limited to, a sports module 220, a headline news module 225, a fun and games module 230, a video module 235, a technology reviews module 240, a clock module 245, a calendar module 250, a weather module 255, and a finance module 260. The number, type, size, format, and content of these modules are provided for illustration. One skilled in the art appreciates that personalized web pages vary with each user and exemplary embodiments include the wide variety of such variations.

The modules provide automatic and real-time updates for information content directed to the particular topic chosen by the user. For example, the RSS headline news module 225 receives syndicated audio files, images, text, and hyperlinks for headline news stories. Such media modules can include other types of content and information, such as film, video, TV, as well as provide additional metadata with the media. Media RSS enables content publishers and bloggers to syndicate multimedia content such as TV and video clips, movies, images, audio, etc.

By way of illustration, this module 225 includes five news stories (labeled Story 1 to Story 5). These stories are periodically or automatically updated according to rules of the information provider and are listed in a predetermined hierarchy. The stories are not listed or presented in a manner that is relevant to a particular user or a particular client computer. In other words, information in the modules is not presented with relevance to the user profile. Instead, popular or most current news stories are listed at the top (i.e., Story 1) while older or less popular news stories are listed farther down the list (i.e., Story 5, Story 6, . . . Story N). Generally, as stories become older in time, they simultaneously become less relevant and hence are moved farther down the list. The specific order of information in the modules is determined by the web portal, web designer, host, or the like.

The web page 200 is personalized since the user selects which modules are presented on the page and where. Typically, a user selects from one or more different modules provided by the web page designer. By way of example, a user could select from different topics, such as news, sports, finance, weather, entertainment, travel, fashion, health, etc. This list is not intended to be exhaustive but rather illustrative of choices for topics in a personalized web page.

According to block 140 in FIG. 1, information is obtained from one or more content providers or information sources, such as an RSS feed. By way of example, a content recommender service runs as an application on a client computer with Internet access. The service prefetches potentially relevant information from web pages, RSS feeds, and blogs, and can be extended for other kinds of dynamic web content.

In one exemplary embodiment, most of the fetched content is in textual form, or a textual description if available. The first step is to convert the content into a format that allows for efficient recommendations in real-time. Examples of such techniques include techniques in fields like information retrieval (IR) and text mining that represent textual documents as bags of words (BOW).

To be able to build upon this retrieval technique, content is parsed. HTML tags, scripts, and other irrelevant parts of HTML pages are removed. Further, the title and description of individual RSS feed items are extracted and assembled into a plain text document. A standard IR procedure is to tokenize the resulting plain text, remove stop words, use a stemming algorithm, and finally represent each HTML page as a term vector. Efficient retrieval of contents given a specific profile or query can, for example, be achieved by using a full-text indexer.
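The tokenize, stop-word, and stemming steps for an RSS feed item might be sketched as follows. The stop-word list is a tiny illustrative sample, and `naive_stem` is a deliberately crude suffix stripper standing in for a real stemming algorithm such as Porter's; both, like the function names, are assumptions of this sketch.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = frozenset(
    {"a", "an", "and", "are", "as", "at", "for", "in", "is", "of", "on", "the", "to", "with"}
)
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(token):
    """Strip one common English suffix, keeping at least three characters.

    Placeholder for a proper stemmer; it is not the Porter algorithm.
    """
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def feed_item_to_term_vector(title, description):
    """Assemble title and description into plain text and return term counts."""
    text = f"{title}\n{description}".lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    return Counter(naive_stem(t) for t in tokens if t not in STOP_WORDS)
```

Because the stand-in stemmer only strips suffixes, inflections such as "rally" and "rallied" are not mapped to the same stem; a full stemming algorithm would be expected to handle such cases better.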

In one embodiment, a more meaningful representation of the content is achieved by classifying each item in terms of conceptual knowledge. Depending on the kind of content, this can result in a single class per document or in a set of semantic tags per document that reflect the content. If the content source does not provide sufficient information, there is a natural way of annotating documents, which is to build a set of classifiers offline from a classified reference corpus. This strategy applies to multi-class classification problems as well as to assigning a fixed set of tags to documents. The classifiers are uploaded to the client and used to decide which candidate document belongs to which topic or which candidate document should be associated with which tag.

According to block 150, the fetched information or RSS content is compared with the user profile. By way of example, a relevancy, ranking, and/or score of the content is determined with respect to the user profile.

According to block 160, a personalization engine is used to recommend or re-rank the fetched content (for example, RSS feeds, hyperlinks, information from blogs, web pages, etc.). Then, according to block 170, the re-ranked content is displayed to the user.

In one embodiment, recommended content is presented to users using a locally running web server. The overall system keeps track of which links shown on the personalized local web sites the user has clicked on. The system also utilizes implicit feedback and other machine learning techniques to continuously improve recommendations over time. The number of items to select from each source reflects the following: i) the previous interest in recommendations from this source, ii) the fraction of visited web pages classified as similar to this category or source, and iii) the similarity of the currently available articles to the profile of this category.

In one embodiment, the re-ranked fetched content is displayed on the personalized web page of the user. FIG. 2B shows the personalized web page 200 after adjusting or re-ranking one or more of the modules 220-260.

By way of illustration, the headline news module 225 includes re-ranked news stories. These stories are automatically arranged and displayed according to the relevance to the user profile. A comparison of FIGS. 2A and 2B reveals that some stories are added (Story 9 and Story 12) and some stories are deleted (Story 2 and Story 4). Further, the stories are re-arranged in the listing of the module. In other words, the hierarchy or importance of stories is altered to present stories more relevant to the profile of the user. Stories relevant to the particular profile of the user are listed first at the top of the module. For example, in FIG. 2A, Story 3 was ranked third in importance. In FIG. 2B, Story 3 is now ranked first (i.e., being at the top of the list in the module). Story 1 was previously presented first and is now ranked fifth.

FIG. 2B shows various examples in the modules where the specific order of information in the modules is determined by the user profile and not the web portal, web designer, host, or the like. The personalized web page is thus automatically adjusted to display content based on the user profile. Content more relevant to the user is presented, while content less relevant is not presented at all or moved down on the hierarchy of the listing.
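The adjustment of a single module can be illustrated with a short sketch that keeps the highest-scoring stories for a fixed number of slots. The function name, the `slots` and `min_score` parameters, and any relevance scores passed in are hypothetical; the description does not specify how many stories a module holds or what cutoff is applied.

```python
def rerank_module(scores, slots=5, min_score=0.0):
    """Return the stories to display, most relevant first.

    scores: dict mapping story name -> relevance to the user profile.
    Stories at or below min_score are dropped; at most `slots` are kept.
    """
    kept = [(score, story) for story, score in scores.items() if score > min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [story for _, story in kept[:slots]]
```

Given made-up scores in which Story 3 scores highest, Story 2 and Story 4 score zero, and newly fetched Story 9 and Story 12 score above Story 1, the function lists Story 3 first, includes Story 9 and Story 12, omits Story 2 and Story 4, and places Story 1 lower in the listing.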

FIG. 3 is a block diagram of a client computer or electronic device 300 in accordance with an exemplary embodiment of the present invention. In one embodiment, the computer or electronic device includes memory 310, profile builder 320, personalization engine 325, display 330, processing unit 340, and one or more buses 350.

In one embodiment, the processing unit 340 includes a processor (such as a central processing unit, CPU, microprocessor, application-specific integrated circuit (ASIC), etc.) for controlling the overall operation of memory 310 (such as random access memory (RAM) for temporary data storage, read only memory (ROM) for permanent data storage, and firmware). The processing unit 340 communicates with memory 310, profile builder 320, and personalization engine 325 via one or more buses 350 and performs operations and tasks necessary to build a user profile and adjust a personalized web page of the user according to the user profile. The memory 310, for example, stores applications, data, programs, algorithms (including software to implement or assist in implementing embodiments in accordance with the present invention) and other data.

Exemplary embodiments provide a system that automatically establishes a personalized content recommendation engine, without requiring end users to provide any kind of configuration. The system can utilize any information found on a local or client computing or electronic device to build a profile. Embodiments include full web usage history of the user. This usage goes beyond click-streams in that the page content itself can be continuously stored and indexed at negligible additional costs. This even holds for content retrieved via secure HTTP. Additional information sources, like local email contacts and emails, can also be utilized as required.

In one exemplary embodiment, one or more blocks or steps discussed herein are automated. In other words, apparatus, systems, and methods occur automatically. As used herein, the terms “automated” or “automatically” (and like variations thereof) mean controlled operation of an apparatus, system, and/or process using computers and/or mechanical/electrical devices without the necessity of human intervention, observation, effort and/or decision.

As used herein, the term “webpage” or “web page” means a resource of information that is suitable for the World Wide Web (WWW or web) and can be accessed through a web browser. This information is usually in HTML (Hyper Text Markup Language) or XHTML (Extensible Hyper Text Markup Language) format, and may provide navigation to other web pages via hypertext links. Web pages are a type of electronic or web document and include files of stored text. In one exemplary embodiment, a web page is a document on the World Wide Web (WWW) and identified with a Uniform Resource Locator (URL).

Exemplary embodiments are not limited to web pages, but also include other electronic media and electronic documents. As used herein, an “electronic document” is electronic media content that is intended to be used in electronic form.

As used herein, the term “RSS” means a family of web feed formats used to publish frequently updated content such as blog entries, news headlines, podcasts, and other information. An RSS document, which is called a “feed,” “web feed,” or “channel,” contains either a summary of content from an associated web site or the full text. With RSS feeds to a web page or electronic documents, users receive current information from their favorite web sites in an automated manner that is easier than manually retrieving such information.

As used herein, the term “relevancy” or “relevant” means having significant and demonstrable bearing on a topic, issue, or matter at hand.

As used herein, the term “client computer” means a personal computer or user workstation that connects to a server to perform a function. Further, as used herein, the term “link” or “hyperlink” means a reference or element in an electronic document that links to another place in the same document or to an entirely different document.

The methods in accordance with exemplary embodiments of the present invention are provided as examples and should not be construed to limit other embodiments within the scope of the invention. For instance, blocks in diagrams or numbers (such as (1), (2), etc.) should not be construed as steps that must proceed in a particular order. Additional blocks/steps may be added, some blocks/steps removed, or the order of the blocks/steps altered and still be within the scope of the invention. Further, methods or steps discussed within different figures can be added to or exchanged with methods or steps in other figures. Further yet, specific numerical data values (such as specific quantities, numbers, categories, etc.) or other specific information should be interpreted as illustrative for discussing exemplary embodiments. Such specific information is not provided to limit the invention.

In the various embodiments in accordance with the present invention, embodiments are implemented as a method, system, and/or apparatus. As one example, exemplary embodiments and steps associated therewith are implemented as one or more computer software programs to implement the methods described herein. The software is implemented as one or more modules (also referred to as code subroutines, or “objects” in object-oriented programming). The location of the software will differ for the various alternative embodiments. The software programming code, for example, is accessed by a processor or processors of the computer or server from long-term storage media of some type, such as a CD-ROM drive or hard drive. The software programming code is embodied or stored on any of a variety of known media for use with a data processing system or in any memory device such as semiconductor, magnetic and optical devices, including a disk, hard drive, CD-ROM, ROM, etc. The code is distributed on such media, or is distributed to users from the memory or storage of one computer system over a network of some type to other computer systems for use by users of such other systems. Alternatively, the programming code is embodied in the memory and accessed by the processor using the bus. The techniques and methods for embodying software programming code in memory, on physical media, and/or distributing software code via networks are well known and will not be further discussed herein.

The above discussion is meant to be illustrative of the principles and various embodiments of the present invention. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

1) A method, comprising: receiving information on a client computer from a server; determining at the client computer a relevancy of the information with respect to a user profile; adjusting a ranking of the information according to the relevancy; and displaying a selected portion of the information on the client computer, based on the adjusted ranking.

2) The method of claim 1, wherein the information includes information from an RSS feed.

3) The method of claim 1 further comprising, wherein the selected portion of the information is displayed on a personalized web page at the client computer.

4) The method of claim 1 further comprising, rearranging a hierarchical order of the information with respect to the user profile.

5) The method of claim 1 further comprising, arranging RSS feeds in a module on a personalized web page based on the ranking of the information according to the relevancy.

6) The method of claim 1 further comprising: adding RSS feeds displayed on a personalized web page based on the ranking of the information according to the relevancy; removing RSS feeds displayed on the personalized web page based on the ranking of the information according to the relevancy; and re-ranking RSS feeds displayed on the personalized web page based on the ranking of the information according to the relevancy.

7) The method of claim 1 further comprising, listing hyperlinks to stories in a hierarchy based on how relevant the stories are to the user profile.

8) A tangible computer readable medium having instructions for causing a computer to execute a method, comprising: building a profile of a user on a client computer; receiving information from an RSS feed on the client computer; ranking the information with respect to the user profile; and displaying the information of the user according to the ranking.

9) The computer readable medium of claim 8 further comprising: prefetching web content that includes web pages, RSS feeds, and information from blogs; converting the web content in textual documents; and determining a relevancy between the textual documents and the profile.

10) The computer readable medium of claim 8 further comprising, using web browser plug-ins installed on the client computer to collect information to build the profile.

11) The computer readable medium of claim 8 further comprising, analyzing a web browsing history of the user on the client computer to build the profile.

12) The computer readable medium of claim 8 further comprising, extracting content from web pages visited by the user on the client computer to build the profile.

13) The computer readable medium of claim 8 further comprising: computing an average TFIDF (term frequency inverse document frequency) vector of documents from web pages visited by the user; and ranking the documents according to the profile.

14) The computer readable medium of claim 8 further comprising, using time to assign a higher weight to web pages more recently visited by the user.

15) The computer readable medium of claim 8 further comprising, using a timestamp of web pages visited by the user to compute a decay factor that changes scores assigned to the web pages.

16) A computer, comprising: a memory storing an algorithm; and processor to execute the algorithm to: determining at a client computer a relevancy of RSS feeds with respect to a user profile built on the client computer; ranking the RSS feeds with respect to the relevancy; and displaying the RSS feeds on an electronic device in accordance with the relevancy.

17) The computer of claim 16, wherein the processor further executes the algorithm to track which hyperlinks a user clicks on to perform adjustments to the user profile.

18) The computer of claim 16, wherein the processor further executes the algorithm to analyze a web browsing history stored on the client computer to build the user profile.

19) The computer of claim 16, wherein the processor further executes the algorithm to continuously store on the client computer content of web pages visited by a user so as to update the user profile.

20) The computer of claim 16, wherein the processor further executes the algorithm to rearrange a hierarchical order of the RSS feeds with respect to the user profile, and the RSS feeds are displayed on a personalized web page on the client computer.